An Intonational Phrase Boundary and Pitch Accent Dependent Speech Recognizer

Authors

  • Ken Chen
  • Mark Hasegawa-Johnson
  • Sung-Suk Kim
Abstract

Does prosody help word recognition? In this paper, we propose a novel probabilistic framework in which word and phoneme models depend on prosody in a way that improves word recognition. We illustrate the idea of prosody dependent speech recognition by building a recognizer that conditions word and phoneme models on two important prosodic variables: intonational phrase boundary and pitch accent. It is known that intonational phrase boundaries induce salient lengthening of phrase-final speech units, while pitch accents induce distinct pitch variation on the accented syllables. Effective prosody-discriminative hidden Markov models (HMMs) can therefore be built by conditioning only a small subset of HMM distributions on prosody: the duration PDFs and the acoustic-prosodic observation PDFs. The prosody dependence of the acoustic-phonetic observation PDFs is ignored in our investigation, which yields a prosody dependent recognizer with a good trade-off between performance and number of parameters. To model the duration of the prosody dependent allophone models accurately, an explicit duration hidden Markov model (EDHMM) is used for both training and decoding. A new acoustic-prosodic feature, obtained by transforming the normalized F0 contour with an artificial neural network (ANN), is added to the acoustic feature vector to model accent-induced pitch variation. The resulting prosody dependent speech recognizer improves word recognition accuracy by 1.8% absolute over prosody independent recognizers on the Boston University Radio News Corpus, which is prosodically transcribed using the ToBI labeling system.
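
The parameter-sharing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the phone label `ah`, the Gaussian-over-log-duration form of the duration PDF, and all numeric parameters are assumptions chosen for illustration. The acoustic-phonetic score is shared across prosodic variants of a phone, while the duration score depends on whether the phone is phrase-final.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DurationModel:
    """Gaussian density over log-duration in frames (a hypothetical choice of duration PDF)."""
    mean_log: float
    std_log: float

    def log_prob(self, frames: int) -> float:
        # Log density of log(frames) under N(mean_log, std_log^2).
        z = (math.log(frames) - self.mean_log) / self.std_log
        return -0.5 * z * z - math.log(self.std_log * math.sqrt(2.0 * math.pi))


# Illustrative parameters only, keyed by (phone, phrase_final).
# The phrase-final variant expects longer durations (pre-boundary lengthening).
DURATION_MODELS = {
    ("ah", False): DurationModel(mean_log=math.log(6), std_log=0.4),
    ("ah", True): DurationModel(mean_log=math.log(12), std_log=0.4),
}


def segment_log_score(phone: str, phrase_final: bool, frames: int,
                      acoustic_log_lik: float) -> float:
    """Shared acoustic-phonetic score plus a prosody dependent duration score."""
    duration_model = DURATION_MODELS[(phone, phrase_final)]
    return acoustic_log_lik + duration_model.log_prob(frames)


def best_boundary_label(phone: str, frames: int,
                        acoustic_log_lik: float = 0.0) -> bool:
    """Return True if the phrase-final variant scores higher for this segment."""
    scores = {
        phrase_final: segment_log_score(phone, phrase_final, frames, acoustic_log_lik)
        for phrase_final in (False, True)
    }
    return max(scores, key=scores.get)
```

Because the acoustic term is identical for both variants, only the duration term discriminates between them here; a full explicit duration HMM would apply such a duration score to every hypothesized segment during decoding.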

Similar resources

Speech Communication Session 4pSCb: Production and Perception I: Beyond the Speech Segment (Poster Session) 4pSCb49. Towards a model of intonational phonology of Turkish: Neutral intonation

This study proposes an Autosegmental-Metrical model of Turkish intonation based on sentences produced in neutral focus, as part of our ongoing research investigating Turkish intonational phonology. Tonal patterns of utterances were examined by varying the length of a word and a phrase, the location of stress, syntactic structures, and sentence types. Preliminary results suggest that Turkish has...

A Maximum Likelihood Prosody Recognizer

Automatic prosody recognition (APR) is of fundamental importance for automatic speech understanding. In this paper, we propose a maximum likelihood prosody recognizer consisting of a GMM-based acoustic model that models the distribution of the phone-level acoustic-prosodic observations (pitch, duration and energy) and an ANN-based language model that models the word-level stochastic dependence ...

The Intonational Structure of Chickasaw

This paper describes the principal features of the intonational system of Chickasaw, a Muskogean language spoken in southcentral Oklahoma. Results of the study are as follows. The Intonational Phrase (IP) consists of one or more Accentual Phrases (AP) which can be larger or smaller than a word. The underlying tonal pattern of the AP is [LHHL]. Chickasaw statements are characteristically marked ...

The Intonation of the Vocative Construction in Persian

This paper examines the intonation of the vocative construction in Persian. Three types of vocative are described and analyzed: the "plain vocative", the vocative uttered in anger (the "angry vocative"), and the vocative uttered in surprise (the "surprised vocative"). The three types are described using recordings of four speakers, analyzed within the framework of autosegmental phonology. The intonation transcription system used in the paper is the Tones and Break Indices (ToBI) system. The results show that the three types in question ...

Pitch accent distribution in German infant-directed speech

Infant-directed speech exhibits slower speech rate, higher pitch and larger f0 excursions than adult-directed speech. Apart from these phonetic properties established in many languages, little is known on the intonational phonological structure in individual languages, i.e. pitch accents and boundary tones and their frequency distribution. Here, we investigated the intonation of infant-directed...

Publication date: 2003